Deep Unsupervised Clustering Using Mixture of Autoencoders
Unsupervised clustering is one of the most fundamental challenges in machine learning. A popular hypothesis is that data are generated from a union of low-dimensional nonlinear manifolds; thus one approach to clustering is to identify and separate these manifolds. In this paper, we present a novel approach to this problem using a mixture of autoencoders. Our model consists of two parts: 1) a collection of autoencoders, where each autoencoder learns the underlying manifold of a group of similar objects, and 2) a mixture assignment neural network, which takes the concatenated latent vectors from the autoencoders as input and infers the distribution over clusters. By jointly optimizing the two parts, we simultaneously assign data to clusters and learn the underlying manifold of each cluster.
Part of this work was done while Dejiao Zhang was an intern at Technicolor Research. The participation of Dejiao Zhang and Laura Balzano was funded by DARPA-16-43-D3M-FP-037. Yifan Sun and Brian Eriksson participated while at Technicolor Research.
https://deepblue.lib.umich.edu/bitstream/2027.42/145190/1/mixae_arxiv_submit.pdf
Description of mixae_arxiv_submit.pdf: Main tech report
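The two-part architecture described above can be sketched in a few lines of numpy. This is an illustrative forward pass only; the dimensions, the linear encoders/decoders, and the assignment matrix `W` are stand-in assumptions, not the paper's actual architecture or training procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
n_dim, latent, k = 16, 4, 3  # input dim, latent dim, number of clusters

# One linear autoencoder per cluster (encoder E[i], decoder D[i]).
E = [rng.standard_normal((latent, n_dim)) * 0.1 for _ in range(k)]
D = [rng.standard_normal((n_dim, latent)) * 0.1 for _ in range(k)]
# Mixture assignment network: maps the k concatenated latents to cluster logits.
W = rng.standard_normal((k, k * latent)) * 0.1

def forward(x):
    """Return soft cluster assignments and the mixture reconstruction loss."""
    zs = [Ei @ x for Ei in E]                  # latent code from each autoencoder
    recons = [Di @ z for Di, z in zip(D, zs)]  # per-cluster reconstruction
    logits = W @ np.concatenate(zs)            # assignment from concatenated latents
    p = np.exp(logits - logits.max())
    p /= p.sum()                               # softmax distribution over clusters
    # Mixture objective: assignment-weighted reconstruction error.
    loss = sum(pi * np.sum((x - r) ** 2) for pi, r in zip(p, recons))
    return p, loss

x = rng.standard_normal(n_dim)
p, loss = forward(x)
```

Jointly minimizing such a loss over the autoencoder weights and the assignment network is what lets cluster assignment and per-cluster manifold learning inform each other.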
Learning Dialogue Representations from Consecutive Utterances
Learning high-quality dialogue representations is essential for solving a
variety of dialogue-oriented tasks, especially considering that dialogue
systems often suffer from data scarcity. In this paper, we introduce Dialogue
Sentence Embedding (DSE), a self-supervised contrastive learning method that
learns effective dialogue representations suitable for a wide range of dialogue
tasks. DSE learns from dialogues by taking consecutive utterances of the same
dialogue as positive pairs for contrastive learning. Despite its simplicity,
DSE achieves significantly better representation capability than other dialogue
representation and universal sentence representation models. We evaluate DSE on
five downstream dialogue tasks that examine dialogue representation at
different semantic granularities. Experiments in few-shot and zero-shot
settings show that DSE outperforms baselines by a large margin. For example, it
achieves 13% average performance improvement over the strongest unsupervised
baseline in 1-shot intent classification on 6 datasets. We also provide
analyses on the benefits and limitations of our model.
Comment: NAACL 2022 main conference
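The positive-pair construction at the heart of DSE pairs each utterance with the one that follows it in the same dialogue, and trains with a standard contrastive (InfoNCE-style) objective over in-batch negatives. A minimal numpy sketch, with random vectors standing in for encoder outputs; the temperature value and toy dimensions are assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(anchors, positives, temperature=0.05):
    """Contrastive loss: each anchor's positive is the matching row;
    all other rows in the batch serve as in-batch negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    sims = a @ p.T / temperature               # pairwise cosine similarities
    sims -= sims.max(axis=1, keepdims=True)    # stabilize the softmax
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))        # cross-entropy on matched pairs

# Toy "dialogue": consecutive utterances form positive pairs (u_t, u_{t+1}).
utter_emb = rng.standard_normal((6, 8))        # stand-in for encoder outputs
anchors, positives = utter_emb[:-1], utter_emb[1:]
loss = info_nce(anchors, positives)
```

The simplicity is the point: no augmentation or annotation is needed, because dialogue structure itself supplies the positive pairs.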
ContraGen: Effective Contrastive Learning For Causal Language Model
Despite exciting progress in large-scale language generation, the
expressiveness of its representations is severely limited by the
anisotropy issue, where the hidden representations are distributed into
a narrow cone in the vector space. To address this issue, we present ContraGen,
a novel contrastive learning framework to improve the representation with
better uniformity and discrimination. We assess ContraGen on a wide range of
downstream tasks in natural and programming languages. We show that ContraGen
can effectively enhance both uniformity and discrimination of the
representations and lead to the desired improvement on various language
understanding tasks where discriminative representations are crucial for
attaining good performance. Specifically, we attain relative improvements
on the Semantic Textual Similarity tasks and on Code-to-Code Search
tasks. Furthermore, by improving the expressiveness of the representations,
ContraGen also boosts the source code generation capability, with relative
improvement in execution accuracy on the HumanEval benchmark.
Comment: 10 pages
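The anisotropy the abstract describes is commonly quantified with the uniformity measure of Wang and Isola: the log of the mean Gaussian-kernel similarity over pairs of L2-normalized embeddings, where lower values mean embeddings spread more evenly over the sphere. The sketch below uses that generic metric, not ContraGen's own training loss, to illustrate the narrow-cone effect:

```python
import numpy as np

def uniformity(x, t=2.0):
    """Wang & Isola uniformity: log mean exp(-t * ||xi - xj||^2) over
    distinct pairs of L2-normalized embeddings. Lower = more uniform."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    iu = np.triu_indices(len(x), k=1)          # distinct pairs only
    return np.log(np.mean(np.exp(-t * sq_dists[iu])))

rng = np.random.default_rng(0)
isotropic = rng.standard_normal((200, 32))     # spread over the whole sphere
cone = np.abs(rng.standard_normal((200, 32)))  # confined to a single orthant
u_iso, u_cone = uniformity(isotropic), uniformity(cone)
```

Embeddings squeezed into a narrow cone score higher (worse) on this metric, which is why improving uniformity is one of the two axes ContraGen targets.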
Extracting Compact Knowledge From Massive Data
Over the past couple of decades, we have witnessed an explosion in data generation from almost every aspect of our lives. Along with such huge volumes of data come more complex models, e.g., deep neural networks (DNNs). This increase in complexity demands new trends in both the modeling and the analysis of data, among which low dimensionality and sparsity lie at the core. In this thesis, we follow this avenue to address some problems and challenges raised by modern data and models.
High-dimensional data are often not uniformly distributed in the feature space, but instead lie in the vicinity of a low-dimensional subspace. Identifying such low-dimensional structures can not only give better interpretability of the data, but also significantly reduce the storage and computation costs for algorithms that deal with the data. The second chapter of this thesis focuses on low-rank linear subspace models; in particular, we improve and analyze an efficient subspace estimation method for streaming data, with emphasis on the undersampled regime.
On the other hand, real-world data are in general non-linear and involve much more complex dependencies, which motivates the development of DNNs. With massive amounts of data and computational power, the high capacity and hierarchical structure of DNNs allow them to learn extremely complex non-linear dependencies and features. However, the successes achieved by DNNs are marred by the inscrutability of the models, poor generalizability, and high demands on data and computational resources, especially given that the size and complexity of DNNs keep increasing. To combat these challenges, we focus on two perspectives: model compression and disentangled representation learning.
DNNs are often over-parameterized, with many parameters being redundant and non-critical; successfully removing these connections is therefore expected to improve both efficiency and generalization. In Chapter III, we go a step further by presenting a new method for compressing DNNs, which encourages sparsity while simultaneously identifying strongly correlated neurons and setting the corresponding weights to a common value. The ability of our method to identify correlations within the network not only helps further reduce the complexity of DNNs, but also allows us to cope with, and gain more insight into, highly correlated neurons instead of being negatively affected by them.
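As a toy illustration of the idea (not the regularized training method proposed in Chapter III), one can detect strongly correlated neurons from their activations after the fact and tie their outgoing weights to a common value; the threshold, dimensions, and greedy grouping below are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
acts = rng.standard_normal((100, 6))       # activations: samples x neurons
acts[:, 3] = acts[:, 0] + 0.01 * rng.standard_normal(100)  # neuron 3 duplicates 0
W = rng.standard_normal((6, 4))            # outgoing weights of the 6 neurons

corr = np.corrcoef(acts, rowvar=False)     # neuron-neuron correlation matrix
threshold = 0.95

# Greedily group neurons whose activations are strongly correlated, then
# tie each group's outgoing weights to a common (mean) value.
groups, assigned = [], set()
for i in range(corr.shape[0]):
    if i in assigned:
        continue
    group = [i] + [j for j in range(i + 1, corr.shape[0])
                   if j not in assigned and corr[i, j] > threshold]
    assigned.update(group)
    groups.append(group)
    W[group] = W[group].mean(axis=0)       # shared weight value per group
```

Tied rows can then be stored once, which is where the additional compression beyond plain sparsity comes from.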
From another perspective, many believe that the poor generalization and interpretability of DNNs can be resolved if the model can, in the setting of unsupervised learning, identify and separate the underlying explanatory factors of data into different factors of its learned representation. Such representations are more likely to be usable across a variety of tasks, with each particular task relevant to a different subset or combination of the representation factors. In Chapter IV, we present an information-theoretic approach for jointly learning a hybrid discrete-continuous representation, where the goal is to uncover the underlying categories of data while simultaneously separating the continuous representation into statistically independent components, each encoding a specific variation in the data.
PhD, Electrical Engineering: Systems
University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/151479/1/dejiao_1.pd
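A hybrid discrete-continuous code of this kind is typically assembled from a relaxed categorical sample (e.g. Gumbel-softmax) concatenated with a Gaussian reparameterized sample. The sketch below shows that generic construction only; it is not the specific information-theoretic objective of Chapter IV, and all dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_softmax(logits, tau=0.5):
    """Differentiable relaxation of a categorical sample (discrete factor)."""
    u = rng.uniform(low=1e-12, high=1.0, size=logits.shape)  # avoid log(0)
    g = -np.log(-np.log(u))                 # Gumbel(0, 1) noise
    y = (logits + g) / tau
    y = np.exp(y - y.max())
    return y / y.sum()                      # soft one-hot over categories

def reparameterize(mu, log_var):
    """Continuous factors via the Gaussian reparameterization trick."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

# Hypothetical encoder outputs: category logits + Gaussian parameters.
logits = np.array([2.0, 0.1, -1.0])         # 3 candidate categories
mu, log_var = np.zeros(5), np.zeros(5)      # 5 continuous factors

y_discrete = gumbel_softmax(logits)
z_continuous = reparameterize(mu, log_var)
code = np.concatenate([y_discrete, z_continuous])   # hybrid representation
```

The discrete part captures the category of a sample while the continuous part captures within-category variation, matching the separation of factors described above.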
Convergence of a Grassmannian Gradient Descent Algorithm for Subspace Estimation From Undersampled Data
Subspace learning and matrix factorization problems have a great many applications in science and engineering, and efficient algorithms are critical as dataset sizes continue to grow. Many relevant problem formulations are non-convex, and in a variety of contexts it has been observed that solving the non-convex problem directly is not only efficient but also reliably accurate. We discuss convergence theory for a particular method: first-order incremental gradient descent constrained to the Grassmannian. The output of the algorithm is an orthonormal basis for a low-dimensional subspace spanned by an input streaming data matrix. We study two sampling cases: where each data vector of the streaming matrix is fully sampled, or where it is undersampled by a sampling matrix with fewer rows than the ambient dimension. Our results cover two cases, where the sampling matrix is Gaussian or a subset of rows of the identity matrix. We propose an adaptive stepsize scheme that depends only on the sampled data and algorithm outputs. We prove that with fully sampled data, the stepsize scheme maximizes the improvement of our convergence metric at each iteration, and that this method converges from any random initialization to the true subspace, despite the non-convex formulation and orthogonality constraints. For the case of undersampled data, we establish monotonic expected improvement on the defined convergence metric at each iteration with high probability.
http://deepblue.lib.umich.edu/bitstream/2027.42/171760/4/GrouseCS-Feb2022.pdf
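In the fully sampled case, each incremental step projects the streamed vector onto the current basis, forms the residual, and moves the basis along a rank-one direction. The sketch below substitutes a QR retraction for the exact geodesic update on the Grassmannian and uses a fixed stepsize rather than the adaptive scheme analyzed above, so it is a structural illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 3
U_true, _ = np.linalg.qr(rng.standard_normal((n, d)))  # planted subspace
U, _ = np.linalg.qr(rng.standard_normal((n, d)))       # random initialization

def incremental_step(U, v, eta=0.1):
    """One step: project the streamed vector, form the residual, take a
    rank-one gradient step, and retract to orthonormal columns via QR
    (a simplification of the exact geodesic update)."""
    w = U.T @ v                    # least-squares weights, fully sampled v
    r = v - U @ w                  # residual orthogonal to span(U)
    Q, _ = np.linalg.qr(U + eta * np.outer(r, w))
    return Q

def subspace_err(U, V):
    """Projection distance between spans: d - ||U^T V||_F^2 (0 iff equal)."""
    return U.shape[1] - np.linalg.norm(U.T @ V) ** 2

err_start = subspace_err(U, U_true)
for _ in range(500):
    v = U_true @ rng.standard_normal(d)   # noiseless streamed vector
    U = incremental_step(U, v)
err_end = subspace_err(U, U_true)
```

On noiseless streamed vectors the residual vanishes as span(U) approaches span(U_true), so the iterates settle onto the planted subspace from a random start, mirroring the global-convergence behavior the abstract describes.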